Construction and analysis of pseudogene-related ceRNA network in breast cancer

Breast cancer (BC) is one of the leading causes of cancer-related deaths in women. The present study explored the potential role of pseudogenes in BC via construction and analysis of a competing endogenous RNA (ceRNA) network through a three-step process. First, we screened differentially expressed genes in nine BC datasets. Then the gene-pseudogenes pairs (nine hub genes) were selected according to the functional enrichment and correlation analysis. Second, the candidate hub genes and interacting miRNAs were used to construct the ceRNA network. Further analysis of the ceRNA network revealed a crucial ceRNA module with two genes-pseudogene pairs and two miRNAs. The in-depth analysis identified the GBP1/hsa-miR-30d-5p/GBP1P1 axis as a potential tumorigenic axis in BC patients. In the third step, the GBP1/hsa-miR-30d-5p/GBP1P1 axis expression level was assessed in 40 tumor/normal BC patients and MCF-7 cell lines. The expression of GBP1 and GBP1P1 was significantly higher in the tumor compared to the normal tissue. However, the expression of hsa-miR-30d-5p was lower in tumor samples. Then, we introduced the GBP1P1 pseudogene into the MCF-7 cell line to evaluate its effect on GBP1 and hsa-miR-30d-5p expression. As expected, the GBP1 level increased while the hsa-miR-30d-5p level decreased in the GBP1P1-overexprsssing cell line. In addition, the oncogenic properties of MCF-7 (cell viability, clonogenicity, and migration) were improved after GBP1P1 overexpression. In conclusion, we report a ceRNA network that may provide new insight into the role of pseudogenes in BC development.


RNAseq data processing and DEG analysis
The FASTQC tool (https:// qubes hub.org/ resou rces/ fastqc) is employed to check raw data quality, including sequence per base quality and sequence length distribution.All reads with a Phred score lower than 30 were discarded.In addition, sequence length distribution was checked before processing data.The TRIMGALORE (https:// www.bioin forma tics.babraham.ac.uk/projects/trim galore) was utilized to exclude adapters and lowquality bases.The HISAT2 (http:// Daehw anKim Lab.github.io /hisat2) was used for aligning to the human reference genome (GRCh38).In addition, the annotation file was downloaded from the human genome browser at UCSC.Differentially expressed genes were found using the Deseq2 package (https:// bioco nduct or.org/ packa ges/ DESeq2).We exclude all the reads except the results with an adj.p-value < 0.01.The median of expression values of the DEGs was then clustered using the k-means method.Pearson's correlation coefficient for genes and pseudogenes was calculated for co-expression analysis.

Functional enrichment
To further investigate the potential functions of DGEs, the BioPlanet (https:// tripod.nih.gov/ biopl anet/) version 19 was used.Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed on 20 DGEs to explore the key biological functions and pathways of candidate genes.For each dataset we searched for the concordance degrees in the top GO terms.The GO terms included the Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) categories.GO keywords and KEGG

Construction of ceRNA network and in-depth analysis
The candidate hub gene-pseudogene pairs and interacting miRNA were used to construct the ceRNA network.A matrix plot of correlations between gene-pseudogene pairs and miRNAs was used to construct a tripartite ceRNA network.The interaction of gene-miRNA-pseudogene was demonstrated by an alluvial plot.
Based on the constructed ceRNA network, an in-depth analysis was performed to find the core module of the ceRNA network.The gene-pseudogene-miRNA interaction scores were estimated according to the expression patterns and correlations.The inclusion criteria were a simultaneous negative correlation between miRNA expression and gene-pseudogene pairs in BC.Clinical information (including age, menopause status, metastasis, tumor stage, and tumor subclass) was used to compare gene-miRNA-pseudogene axes in the core ceRNA module.

Experimental validation BC sample collection
For experimental validation, we collected 80 BC samples (40 breast tumors and 40 surrounding tumor-free margins) during surgery.The sample collection and experimental procedures were approved by the Golestan University of Medical Sciences Ethics Committee (IR.GOUMS.REC.1398.009).All experiments were performed in accordance with relevant guidelines and regulations.The MCF7 cell line was used for GBP1P1 cloning and expression analysis.

Expression of GBP1/hsa-miR-30d-5p/GBP1P1 axis in BC patients
Total RNA was extracted using TRIZOL (Gibco, Life Technologies, Carlsbad, CA, U.S.A), and cDNA was synthesized using the first strand cDNA synthesis kit (Thermo Fisher Scientific, Waltham, MA, U.S.A) from 1 µg of RNA.Relative levels of GBP1 and GBP1P1 expression were assessed using quantitative real-time PCR.The PSMB2 was used as an internal control.The sequence of primers is shown in Supplementary table S1.The PCR conditions were: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s. Relative levels of expression were calculated using the calibrator-normalized method.

MCF-7 cell culture and GBP1P1 overexpression
The MCF7 cells were cultured in a DMEM medium with 10% FBS (Biosera, Shanghai, China).The pEGFP-C1 plasmid, harboring GBP1P1 pseudogene, was introduced into the MCF-7 for GBP1P1 overexpression.First, the vector was transformed into a competent DH5-α cell.The calcium phosphate method was used for transfection.Briefly, MCF-7 cells were plated onto 6-well plates 4-6 h before transfection.3-5 µg of plasmid were mixed in HBS (2X) and calcium chloride 2 M solutions and filter sterilized.The mixture was sprinkled slowly on the cells and incubated at 37 °C.The transfected cells were examined under fluorescence microscopy after 24 and 48 h.Cell cultures were treated with 400 µg/ml of Neomycin G418 (Sigma-Aldrich, St. Louis, USA) for three days and harvested for further evaluation.The MCF7 cell transfected with empty pEGFP-C1 vector was used as control.

Flow cytometry and viability assay
For cell viability assay, the pseudogene-harboring MCF-7 cells were plated onto 96-well plates and incubated overnight at 37 °C.After 24 h, 20 μl of MTS reagent was added, and plates were incubated for 4 h at 37 °C.The survival of cells was calculated according to the percentage of cell proliferation.
Flow cytometry was used to evaluate the early apoptosis in pseudogene-containing cells.Briefly, 1 × 10 6 cells were seeded in 6-well plates for 48 h.Then, cells were washed with ice-cold PBS, trypsinized, and centrifuged at 4 °C for 4 min.Then 5 μl Annexin V-FITC (Invitrogen, USA) and 5 μl propidium iodide (PI, Invitrogen, Carlsbad, CA, USA) were added into each well.After 15 min, 400 μl cold binding buffer was added.Finally, the early apoptotic rate was measured using BD Accuri C6 Flow Cytometer (BD Biosciences, Dubai, UAE).

Wound healing assay
The scratch test was performed to test the migration potential of GBP1P1-harboring cells.Briefly, cells were plated in 12-well plates, and a scratch with a pipet tip was made through the plate.Cells were washed with PBS, then RPMI 1640 medium was added in each well.The cells were photographed at 0, 6, 12, 18, and 24 h after wounding.Images were analyzed using TScratch software 34 .

Statistical analysis
To estimate the difference between groups, various statistical tests, including paired t-test, Wilcoxon test, and Kruskal-Wallis test, were used for relevant analysis.For DEG profiles, the differences between groups were calculated according to p-values and false discovery rate (FDR).The ceRNA network is visualized by RAWGraph 2.0 (https:// app.rawgr aphs.io/) and Cytoscape 3.10.0 35.The qPCR data were analyzed using the calibrator-normalized method.Data are expressed as mean ± standard error of the mean (SEM).All statistical analyses were performed using GraphPad Prism 9.0 (GraphPad Software, La Jolla, USA) and R programming language (version 4.3.0).

The landscape of gene-pseudogene dysregulation in BC
To identify the common dysregulated gene and pseudogenes in BC, we performed a differential gene expression analysis on nine GEO datasets (Fig. 1).As a result, we have found 962 common transcripts with significant p values (including 614 genes and 348 lncRNAs) in nine BC datasets (Fig. 2A).Most of the DGEs were found in more than two datasets.Then gene-pseudogene pairs were selected according to the correlation coefficient.We have found 30 gene-pseudogene pairs with significant correlation in BC datasets.GBP1-GBP1P1, PDE4DIP-PDE4DIPP2, and DUSP5-DUSO5P1 have the highest correlation coefficients and best p values compared to other gene-pseudogene pairs (Fig. 2B).

Functional enrichment
Enrichment analyses were performed to investigate the biological function of differentially expressed genes (DEGs) in BC.The results of pathway analysis have shown that significant cancer-related (AKT/mTOR and c-Met pathways) and translation-related mechanisms (eIF4E release, translation factors) were enriched (Fig. 2C and D).These pathways are essential in cell proliferation, indicating their potential role in BC tumorigenesis.According to the GO terms, DEGs are primarily engaged in biological processes (mitochondrial electron transport), cellular components (respiratory chain complex III), and molecular functions (translation initiation factor activity).In molecular function, GO terms confirm the role of cancer-related pathways and translation mechanisms (Fig. 2D).Detailed results are presented in supplementary file 2.

Screening for hub genes
A connection map between GO molecular function terms and genes was constructed to identify the hub genes, which depicts the number of overlapped genes with p-values in each pathway (Fig. 3A).According to this map, the GBP1 has the highest connections with the best p values.Then, the potential prognostic values of DEGs were evaluated (Fig. 2B-D).Based on the survival analysis, the GBP1 showed a significant correlation (HR = 1.26, p-value < 0.0001) to BC prognosis (Fig. 3B and supplementary Figure S1).When we evaluated the co-expression pattern of gene-pseudogene pairs in TCGA, some pairs showed high correlations and significant p-values (Fig. 3C and Supplementary table S2).GBP1 and its pseudogene, GBP1P1, demonstrated the highest correlation coefficient in BC (r = 0.888, p value < 0.0001).However, some gene-pseudogene pairs, for example, DUSP5-DUSP5P1, showed the opposite expression pattern (r = -0.108).
Next, these genes were queried for predictive value to discriminate breast tumors from normal samples using AUC calculation.According to ROC curves (Fig. 3D), GBP1 with AUC = 0.719 and UBQLN4 with AUC = 0.704 had the highest discriminatory power.Then, the same approach was used for chemotherapy responder/nonresponder discrimination (Fig. 3E).Again, GBP1 and UBQLN4 could distinguish chemotherapy responder tumors from non-responder ones with nearly acceptable AUC (0.692 and 0.655, respectively).Detailed ROC curve data are presented in Supplementary file 3.

Construction of ceRNA network
The hub gene-pseudogene pairs were used to screen related miRNAs according to experimentally-validated interactions retrieved from miRNet and StarBase.A matrix plot was created to demonstrate the correlation coefficients and p values of gene-pseudogene-miRNA interactions (Fig. 4A).Finally, with 10 hub gene-pseudogene pairs and 34 associated miRNAs, a ceRNA network was constructed (Fig. 4B).The tripartite ceRNA network shows each player in different columns.
Although only validated gene-miRNA interactions were used, this network needs to be improved.For example, some miRNAs, such as hsa-miR-7-5p and hsa-miR-16-5p, strongly correlate with several genes.However, these miRNAs positively correlate to genes unrelated to the ceRNA mechanism.Furthermore, some genes correlate negatively to several miRNAs, but their pseudogene counterpart does not follow the same pattern (DUSP5 www.nature.com/scientificreports/-DUSP5P1).Therefore, we tried to find the core module of the ceRNA network according to expression patterns and correlation of miRNAs.

The core module of the ceRNA network
The candidate gene-miRNA-pseudogene axes were filtered according to (1) the p-value and size of the correlation between miRNA and gene-pseudogene and (2) the concurrent negative correlation of miRNAs with www.nature.com/scientificreports/gene-pseudogene pairs (Supplementary file 4).As a result, a core module of ceRNA was extracted (Fig. 5A and  B) that contained two genes (GBP1 and PDE4DIP), two pseudogenes (GBP1P1 and PDE4DIPP2), and two miRNAs (hsa-miR-30d-5p and hsa-miR-17-5p).In this interaction network, the hsa-miR-30d-5p is negatively correlated to both GBP1-GBP1P1 and PDE4DIP-PDE4DIPP2 pairs.We compared the correlation of BC patient clinical information (age, menopause status, metastasis, tumor stage, and tumor subclass) to GBP1 and PDE4DIP expression (Fig. 5C).GBP1 shows a more significant correlation to BC patients' characteristics than PDE4DIP.Also, we have previously (Fig. 3) shown that GBP1 has a higher prognostic value than other candidate genes.Therefore, we have finally selected GBP1/ hsa-miR-30d-5p /GBP1P1 axis for experimental validation.

GBP1/hsa-miR-30d-5p/GBP1P1 axis expression in BC patients and cell line
We evaluated the expression pattern of GBP1, hsa-miR-30d-5p, and GBP1P1 in BC samples (Fig. 6A).As expected, the expression of GBP1 and its pseudogene GBP1P1 were significantly higher in tumor samples.The expression of GBP1 and GBP1P1 in tumor tissue was 3.4-fold and 5.7-fold higher, respectively.Furthermore, the expression of hsa-miR-30d-5p was lower in tumor samples compared to adjacent normal breast tissues.When the relative expression of the GBP1/hsa-miR-30d-5p/GBP1P1 axis were compared to patient's pathological data, the expression of these genes was significantly correlated to lymph nodes metastasis, cancer stage and tumor grade (Supplementary Table S3).
When we introduced GBP1P1 into the MCF-7 cell line, GBP1 expression significantly increased in GBP1P1harbouring cells.Also, the level of hsa-miR-30d-5p was decreased after GBP1P1 induction (Fig. 6B).

Overexpression of GBP1P1 improves the tumorigenic properties of MCF-7
The GBP1P1-harbouring MCF-7 cells showed more tumorigenic potential than controls.The viability of GBP1P1-MCF7 cells was higher than control MCF-7 (Fig. 7A).Also, we compared the early apoptosis of cells.Although most cells were alive in both cell lines, the GBP1P1-MCF7 cells had higher early apoptosis, while MCF-7 had higher late apoptosis rates (Fig. 7B).The early apoptosis rate in GBP1P1-MCF7 and MCF7 was 5.93% and 1.01%, respectively.In contrast, the late apoptosis rate was higher in control MCF-7 than in GBP1P1-MCF7 cells (4.31% vs. 2.07%).
In addition, GBP1P1 introduction into MCF-7 improved the wound healing properties (Fig. 7C) and colony formation potential (Fig. 7D) of GBP1P1-MCF7 cells compared to the control.In the wound healing assay, we observed faster wound closure in GBP1P1-MCF7 cells than in MCF-7 (p = 0.03).Moreover, in the colony formation assay, GBP1P1-MCF7 cells had more colony formation potential (p = 0.004).

Discussion
Transcribed pseudogenes are genomic relicts of coding genes with the potential of serving as ceRNAs to regulate the parental gene expression level.There are several studies that explored lncRNA-related ceRNA network in BC 36 .The lncRNAs-mediated ceRNA networks regulate the expression of genes related to proliferation, drug resistance, and apoptosis and promote the development of BC.Although several studies have revealed the role of lncRNA-related ceRNA in BC, a single study (Welch et al.) was conducted on pseudogene-related ceRNA network in BC cancer 37 .
In the present study, we proposed an approach to construct a pseudogene-related ceRNA network in BC.We have found several dysregulated gene-pseudogene pairs in BC datasets.Three gene-pseudogene pairs including, GBP1-GBP1P1, PDE4DIP-PDE4DIPP2, and DUSP5-DUSO5P1 showed significant correlations.Welch et al. 37 also reported two gene-pseudogene pairs including GBP1-GBP1P1 and SUZ12-SUZ12P1 in BC.
KEGG analysis revealed several cancer-related pathways were enriched with gene-pseudogene pairs.These pathways can be categorized into tumor-related pathways (AKT/mTOR and c-Met) and translation-related pathways (eIF4E release and cap-dependent mRNA activation factors).The enriched pathways are critical in BC development and drug resistance [38][39][40][41][42][43][44][45] .Alteration of the AKT/mTOR pathway is a usual event in BC.Approximately 70% of BC cases have distributions in AKT/mTOR pathway 39 .The c-Met pathway is a key regulator of epithelial-to-mesenchymal transition which enhances BC tumor proliferation, survival, motility, and invasion 41 .Translation initiation-related pathways are also important in BC progression.The recruitment of ribosome to mRNA is mediated by eIF4F.eIF4E is upregulated in 50% of BC and promotes tumor formation 43 .
We have extracted the core module from the ceRNA network, including two gene-pseudogene pairs and two miRNAs.The GBP1/ GBP1P1/hsa-miR-30d-5p axis showed significant correlation to clinical information.The expression levels of GBP1 and its pseudogene, GBP1P1, were significantly upregulated in tumor samples.In contrast, the hsa-miR-30d-5p level decreased in the tumor sample.Welch et al. 37 have found a statistically significant reverse correlation between GBP1, GBP1P1, and hsa-miR-199a, which has been demonstrated to control autophagy in BC cells.(B) The rate of early apoptosis was higher in GBP1P1-MCF7 (5.93%), while the rate of late apoptosis was higher in control MCF-7 (4.31%).(C) Wound healing assay.The wound closure rate was higher in GBP1P1-MCF than control (p = 0.0305).(D) Colony formation assay.The higher colony area in GBP1P1-MCF7 cells shows more colony formation potential of GBP1P1-harboring cells.
In conclusion, we report a gene-pseudogene ceRNA network in BC, which may provide new insight into the role of gene-miRNA-pseudogene axes in BC development.

Figure 1 .
Figure 1.The flowchart of the study process.

-Figure 2 .Figure 3 .
Figure 2. The landscape of gene-pseudogene dysregulation in breast cancer.(A) Flower plot diagram showing common differentially expressed genes across 9 breast cancer datasets.(B) gene-pseudogene pairs with the highest correlation coefficients and best p values.(C) and (D) Functional enrichment analysis of the genepseudogene pairs.Several cancer-related pathways and mechanisms were enriched in KEGG pathways and GO terms.

CorrelationFigure 4 .
Figure 4. Construction of competing endogenous RNA (ceRNA) network.(A) Interaction plot of gene-miRNA-pseudogene.The color of the squares shows the correlation coefficient and the size of the squares shows the p-value.(B) Tripartite ceRNA network.

Figure 5 .
Figure 5.Further screening for core ceRNA module.(A) Matrix plot of gene-pseudogene-miRNA axes.The selection was based on the p-value, correlation size, and concurrent negative correlation of miRNA with gene-pseudogene pairs.(B) The core module of the ceRNA network with two gene-pseudogene pairs and two miRNAs.(C) Association between GBP1 and PDE4DIP with clinical information of breast cancer patients.(D) Correlation between GBP1/hsa-miR-30d-5p/GBP1P1 expressions in breast cancer.

Figure 7 .
Figure 7. Tumorigenic properties of GBP1P1-MCF7.When GBP1P1 transfected into MCF7, the tumorigenic potential of GBP1P1-MCF7 increased.(A) The viability of cells was higher in GBP1P1-MCF7 cells (p = 0.004).(B)The rate of early apoptosis was higher in GBP1P1-MCF7 (5.93%), while the rate of late apoptosis was higher in control MCF-7 (4.31%).(C) Wound healing assay.The wound closure rate was higher in GBP1P1-MCF than control (p = 0.0305).(D) Colony formation assay.The higher colony area in GBP1P1-MCF7 cells shows more colony formation potential of GBP1P1-harboring cells.